When considering the problem of unmixing hyperspectral images, most of the literature in the
geoscience and image processing areas relies on the widely acknowledged linear mixing model (LMM).
However, in specific but common contexts, the LMM may not be valid and other nonlinear models
should be invoked. Consequently, over the last few years, several significant contributions have been
proposed to overcome the limitations inherent in the LMM. In this paper, we present an overview
of recent advances that deal with the nonlinear unmixing problem. The main nonlinear models are
introduced and their validity discussed. Then, we describe the main classes of unmixing strategies
designed to solve the problem in supervised and unsupervised frameworks. Finally, the problem of
detecting nonlinear mixtures in hyperspectral images is addressed.

I. Introduction

Spectral unmixing (SU) is of prime interest for analyzing hyperspectral data, not only for remote
sensing applications but also for planetary science and micro-spectroscopy. SU provides a
comprehensive and quantitative mapping of the elementary materials that are present in the acquired data.
More precisely, SU consists of identifying the spectral signatures of these materials (usually called
endmembers) and estimating their relative contributions (or abundances) to the measured spectra. Due
to its ill-posed nature, this is a challenging issue that has received considerable attention from the
remote sensing, signal and image processing communities [1]. Hyperspectral data analysis can be
supervised, when the endmembers are known, or unsupervised, when they are unknown. Irrespective
of the case, most of the existing spectral mixture analysis approaches require the definition of the
mixing model underlying the sensed observations. A mixing model describes in an analytical fashion
how the endmembers combine to form the mixed spectrum measured by the sensor. The abundances
parametrize the model. Given the mixing model, SU boils down to inverting this formation process to
infer the quantities of interest, namely the endmembers and/or the abundances, from the collected
spectra. Unfortunately, defining the direct observation model that links these meaningful quantities to
the measured data is not a trivial issue, and requires a thorough comprehension of complex physical
phenomena. A model based on radiative transfer (RT) could accurately describe the light scattering
by the materials in the observed scene, but would probably lead to very complex unmixing problems.
Fortunately, invoking simplifying but sound assumptions can lead to exploitable mixing models.

When the mixing scale is macroscopic and each photon reaching the sensor has interacted with
just one material, the measured spectrum $\mathbf{y}_p \in \mathbb{R}^L$ in the $p$th pixel can be
accurately described by the following linear mixing model (LMM),

$$\mathbf{y}_p = \sum_{r=1}^{R} a_{r,p}\,\mathbf{m}_r + \mathbf{n}_p \qquad (1)$$

where $L$ is the number of spectral bands, $R$ is the number of endmembers present in the image,
$\mathbf{m}_r$ is the spectral signature of the $r$th endmember, $a_{r,p}$ is the abundance of the
$r$th material in the $p$th pixel and $\mathbf{n}_p$ stands for an additive term associated with the
measurement noise and the modeling error. Also, the abundances can be interpreted as the relative
areas occupied by the materials in a given image pixel [2]. Thus it is natural to consider additional
constraints regarding the coefficients $a_{r,p}$:

$$\begin{cases} a_{r,p} \geq 0, & \forall p,\ \forall r \\ \sum_{r=1}^{R} a_{r,p} = 1, & \forall p \end{cases} \qquad (2)$$

In that case, SU can be formulated as a constrained blind source separation problem, or constrained
linear regression, depending on the prior knowledge available regarding the endmember spectra.
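
As an illustration of the supervised case, the following minimal sketch simulates one pixel under the LMM and estimates its abundances by least squares under the positivity and sum-to-one constraints. It is not one of the algorithms cited in this paper: the solver is a plain projected-gradient iteration chosen for brevity, and the names `project_simplex`, `unmix_lmm`, the random endmember matrix and the noise level are all assumptions made for the example.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index satisfying the condition
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def unmix_lmm(y, M, n_iter=2000):
    """Projected gradient for min_a 0.5 * ||y - M a||^2 under the abundance constraints."""
    R = M.shape[1]
    a = np.full(R, 1.0 / R)                    # start at the center of the simplex
    step = 1.0 / np.linalg.norm(M, 2) ** 2     # inverse Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = M.T @ (M @ a - y)
        a = project_simplex(a - step * grad)
    return a

# Hypothetical data: random "endmembers" (not real spectra) and one noisy pixel.
rng = np.random.default_rng(0)
L, R = 50, 3
M = rng.uniform(0.1, 0.9, size=(L, R))
a_true = np.array([0.6, 0.3, 0.1])
y = M @ a_true + 0.001 * rng.standard_normal(L)
a_hat = unmix_lmm(y, M)
```

In the unsupervised case `M` would also be unknown, which turns this regression into the constrained source separation problem mentioned above.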

Due to the relative simplicity of the model and the straightforward interpretation of the analysis
results, LMM-based unmixing strategies predominate in the literature dedicated to the SU problem.
The first automated spectral unmixing algorithms, proposed in the 1990s and based on geometrical
concepts, were designed to identify endmembers as purest pixels. The most popular algorithms are
probably the pixel purity index, N-FINDR and vertex component analysis (VCA) (see [1] for a
comprehensive review of linear unmixing methods). It is worth noting that this class of algorithms
does not explicitly rely on the assumption of pixels coming from linear mixtures. They only search
for endmembers as extremal points in the hyperspectral dataset, and this might indicate that some of
these approaches can still be valid for nonlinear mixtures that preserve this property, provided there
are pure pixels in the analyzed image.
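
The search for extremal points can be sketched as follows. This is a deliberately simplified variant of the pixel purity index idea: random directions ("skewers") are drawn, and each pixel is scored by how often it is the extreme of the projected data. The usual dimensionality reduction step is omitted, and the function name, the number of skewers and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pixel_purity_index(Y, n_skewers=500, seed=0):
    """Count how often each pixel (column of Y) is extremal along a random direction."""
    rng = np.random.default_rng(seed)
    L, P = Y.shape
    counts = np.zeros(P, dtype=int)
    for _ in range(n_skewers):
        d = rng.standard_normal(L)             # random direction in the spectral space
        proj = d @ Y                           # projection of every pixel on that direction
        counts[np.argmax(proj)] += 1
        counts[np.argmin(proj)] += 1
    return counts

# Hypothetical noise-free linear mixtures; the first R pixels are pure by construction.
rng = np.random.default_rng(1)
L, R, P = 20, 3, 200
M = rng.uniform(0.0, 1.0, size=(L, R))
A = rng.dirichlet(np.ones(R), size=P).T        # R x P abundances on the simplex
A[:, :R] = np.eye(R)
Y = M @ A
counts = pixel_purity_index(Y)
candidates = np.argsort(counts)[-R:]           # indices of the most frequently extremal pixels
```

Nothing in the scoring step uses the mixing equation itself; it only relies on the pure pixels being extremal, which is the property discussed above.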

Conversely, another class of geometrical algorithms focuses on minimizing the volume of the
simplex spanned by the endmembers. Typical examples of these algorithms are the minimum volume
transform (MVT), minimum volume simplex analysis (MVSA) or the simplex identification via
variable splitting and augmented Lagrangian (SISAL) [1]. In that case, instances of endmembers as
pure pixels are not required and the linearity assumption is implicitly invoked since the observations
are assumed to lie inside the recovered simplex.

A large family of unmixing methods is based on a generative model, explicitly describing the 
observed pixels as linear mixtures of unknown endmembers. Joint estimation of the endmembers 
and the abundances can be formulated as a nonnegative matrix factorization (NMF) problem, whose 
additional constraints can be handled in a deterministic or statistical framework. Among these fully
parametric methods, one can point to the iterative constrained endmembers (ICE) algorithm, the 
minimum volume constrained NMF (MVC-NMF) or Bayesian algorithms. 

All these techniques have been shown to be very useful whenever the LMM represents a good
approximation to the actual mixing. There are, however, practical situations in which the LMM is not
a suitable approximation [1]. In these cases, more complex mixing models need to be considered.

Several approximations to the RT theory have been proposed, such as Hapke's bidirectional model
[2]. Unfortunately, these models require highly nonlinear and integral formulations that hinder
practical implementations of unmixing techniques. To circumvent these difficulties, several
approximate but exploitable nonlinear mixing models have been recently proposed, and they have been
receiving growing attention in the literature. This is due to their ability to capture important nonlinear
effects that are inherent characteristics of hyperspectral images in several applications. They have
proven essential to unveil meaningful information for the geoscience community. Developing effective
unmixing algorithms based on nonlinear mixing models represents a challenge for the signal and image
processing community. Supervised and unsupervised algorithms need to be designed to cope with
nonlinear transformations that can be partially or totally unknown. Solving the nonlinear unmixing
problem requires innovative approaches to existing signal processing techniques.

This article reviews state-of-the-art nonlinear unmixing techniques. The paper is
organized as follows. The existing nonlinear mixing models are presented in Section II, where
their validity and intrinsic limitations are discussed. The principal nonlinear unmixing algorithms
are presented in Section III. Model-based and model-free algorithms are considered, and existing
solutions for supervised and unsupervised unmixing are discussed. Section IV presents some recent
strategies to detect nonlinear mixtures in hyperspectral data. Finally, challenges and future directions
for hyperspectral unmixing are reported in the concluding section.

II. Nonlinear models 

In [1], it is explained that linear mixtures are reasonable when two assumptions are wholly fulfilled.
First, the mixing process must occur at a macroscopic scale [3]. Second, the photons that reach the
sensor must interact with only one material, as is the case in checkerboard-type scenes [4]. An
illustration of this model is depicted in Fig. 1 for a scene composed of two materials. When one of
these two assumptions does not hold, different nonlinear effects may occur. Two families of nonlinear
models are described in what follows.




Fig. 1. Linear mixing model: the imaged pixel is composed of two materials. 



A. Intimate mixtures 

The first assumption for linear mixtures is a macroscopic mixing scale. However, there are common
situations when interactions occur at a microscopic level. The spatial scales involved are typically
smaller than the path length followed by the photons. The materials are said to be intimately mixed
[2]. Such mixtures have been observed and studied for some time, e.g., for imaged scenes composed
of sand or mineral mixtures [5]. They have been advocated for analyzing mixtures observed in the
laboratory [6]. Based on RT theory, several theoretical frameworks have been derived to accurately
describe the interactions undergone by the light when encountering a surface composed of particles.




Fig. 2. Intimate mixture: the imaged pixel is composed of a microscopic mixture of several constituents.

An illustration of these interactions is represented in Fig. 2. Probably the most popular are the
approaches developed by Hapke in [2] since they involve meaningful and interpretable quantities
which have physical significance. Based on these concepts, several simplified nonlinear mixing models
have been proposed to relate the measurement to some physical characteristics of the endmembers
and to their corresponding abundances, which, for intimate mixtures, are associated with the relative
mass fractions. In [6], the authors derive an analytical model to express the measured reflectances
as a function of parameters intrinsic to the mixtures, e.g., the mass fraction, the characteristics of
the individual particles (density, size) and the single-scattering albedo¹. However, this model also
strongly depends on parameters inherent to the experiment since it requires perfect knowledge of
the geometric positioning of the sensor with respect to the observed sample. This dependency upon
external parameters makes the inversion, i.e., the estimation of the mass fractions from the collected
spectra, very difficult to implement.

More generally, it is worth noting that this first requirement of linear mixtures is intrinsically
related to the definition of the endmembers, and this definition may be application dependent. Indeed,
defining a pure material requires specification of the spatial or spectral resolution, which is application
dependent. Consider a simple scene composed of 3 materials A, B and C. It is natural to expect
retrieval of these components individually when analyzing the scene. However, in other circumstances,
one may be interested in the material components themselves, for instance, $A_1$, $A_2$, $B_1$, $B_2$, $C_1$
and $C_2$ if we assume that each material is composed of 2 constituents. In that case, each pair of
subcomponents combines and, by performing unmixing, one might also be interested in recovering
each of these 6 components. Conversely, it may be well known that the material A can never be present
in the observed scene without the material B. In such a case, unmixing would consist of identifying the
couple A + B and C, without distinguishing the subcomponent A from the subcomponent B. This
issue is frequently encountered in automated spectral unmixing. In each scenario, it is clear that the
more details are desired, the less likely the mixtures are to occur at a macroscopic scale. To circumvent
this difficulty in defining the mixture scale, it makes sense to associate pure components with individual
instances whose resolution has the same order of magnitude as the sensor resolution. For example, a
patch of sand of spatially homogeneous composition can be considered as a unique pure component.
In that case, most of the interactions occurring in most of the scenes of interest can be reasonably
assumed to occur at a macroscopic level, at least when analyzing airborne and spaceborne remotely
sensed images.

¹Hapke's model and related approximations are not reproduced here for conciseness. Interested readers are invited to
consult [6] or the more signal processing-oriented papers [7], [8] for an accessible description of these models.

B. Bilinear models

Another type of nonlinear interaction occurs at a macroscopic scale, in particular in so-called
multilayered configurations. One may encounter this nonlinear model when the light scattered
by a given material reflects off other materials before reaching the sensor. This is often the case for
scenes acquired over forested areas, where there may be many interactions between the ground and
the canopy. An archetypal example of this kind of scene is shown in Fig. 3.




Fig. 3. Bilinear model: the imaged pixel is composed of two endmembers, namely tree and soil. In addition to the individual 
contribution of each material, bilinear interactions between the tree and the soil reach the sensor. 



Several models have been proposed to analytically describe these interactions. They consist of
including powers of products of reflectances. However, they are usually employed such that interactions
of orders greater than two are neglected. The resulting models are known as the family of bilinear
mixing models. Mathematically, for most of these bilinear models, the observed spectrum
$\mathbf{y}_p \in \mathbb{R}^L$ in $L$ spectral bands for the $p$th pixel is approximated by the following expansion

$$\mathbf{y}_p = \sum_{r=1}^{R} a_{r,p}\,\mathbf{m}_r + \sum_{i=1}^{R-1}\sum_{j=i+1}^{R} \beta_{i,j,p}\,\mathbf{m}_i \odot \mathbf{m}_j + \mathbf{n}_p \qquad (3)$$

where $\odot$ denotes the termwise (Hadamard) product. In the right-hand side of (3), the first term,
also found in (1), summarizes the linear contribution in the mixing while the second term models
nonlinear interactions between the materials. The coefficient $\beta_{i,j,p}$ adjusts the amount of
nonlinearities between the components $\mathbf{m}_i$ and $\mathbf{m}_j$ in the $p$th pixel.
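Several alternatives for imposing constraints on these nonlinear coefficients have been suggested. In
[9], Nascimento and Dias assume that the (linear) abundance and nonlinearity coefficients obey

$$\begin{cases} a_{r,p} \geq 0, & \forall p,\ \forall r \\ \beta_{i,j,p} \geq 0, & \forall p,\ \forall i \neq j \\ \sum_{r=1}^{R} a_{r,p} + \sum_{i=1}^{R-1}\sum_{j=i+1}^{R} \beta_{i,j,p} = 1, & \forall p \end{cases} \qquad (5)$$

It is worth noting that, from (5), this Nascimento model (NM), also used in [10], can be interpreted
as a linear mixing model with additional virtual endmembers. Indeed, considering
$\mathbf{m}_i \odot \mathbf{m}_j$ as a pure component spectral signature with corresponding abundance
$\beta_{i,j,p}$, the model in (5) can be rewritten

$$\mathbf{y}_p = \sum_{s=1}^{\tilde{R}} \tilde{a}_{s,p}\,\tilde{\mathbf{m}}_s + \mathbf{n}_p$$

with the positivity and additivity constraints in (2) where

$$\begin{cases} \tilde{a}_{s,p} \triangleq a_{r,p}, \quad \tilde{\mathbf{m}}_s \triangleq \mathbf{m}_r, & s = 1, \ldots, R \\ \tilde{a}_{s,p} \triangleq \beta_{i,j,p}, \quad \tilde{\mathbf{m}}_s \triangleq \mathbf{m}_i \odot \mathbf{m}_j, & s = R+1, \ldots, \tilde{R} \end{cases}$$

and $\tilde{R} = \frac{1}{2}R(R+1)$. This NM reduces to the LMM when $\tilde{a}_{s,p} = 0$ for $s = R+1, \ldots, \tilde{R}$.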
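
The virtual-endmember interpretation can be checked numerically. The sketch below builds the augmented endmember matrix by appending all termwise products of pairs of distinct endmembers, and verifies that the bilinear expression and the linear expression over the augmented set give the same spectrum. The helper name `augment_endmembers` and the numerical values are assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def augment_endmembers(M):
    """Append the R(R-1)/2 termwise products m_i * m_j (i < j) to the R endmembers."""
    R = M.shape[1]
    products = [M[:, i] * M[:, j] for i, j in combinations(range(R), 2)]
    return np.column_stack([M] + products)

rng = np.random.default_rng(0)
L, R = 30, 3
M = rng.uniform(0.1, 0.9, size=(L, R))         # hypothetical endmember matrix
M_tilde = augment_endmembers(M)                # L x R(R+1)/2

a = np.array([0.5, 0.2, 0.1])                  # linear abundances
beta = np.array([0.10, 0.06, 0.04])            # beta_12, beta_13, beta_23 (all sum to one with a)

# Bilinear expression of the noise-free NM spectrum.
y_bilinear = (M @ a
              + beta[0] * M[:, 0] * M[:, 1]
              + beta[1] * M[:, 0] * M[:, 2]
              + beta[2] * M[:, 1] * M[:, 2])
# Same spectrum written as a linear mixture of real and virtual endmembers.
y_virtual = M_tilde @ np.concatenate([a, beta])
```

With this rewriting, any linear unmixing routine that enforces positivity and sum-to-one can in principle be applied to `M_tilde`.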

Conversely, in [11], the authors have fixed the nonlinearity coefficients as functions of the (linear)
abundance coefficients themselves: $\beta_{i,j,p} = a_{i,p}\,a_{j,p}$ ($i \neq j$). The resulting model,
called the Fan Model (FM) in what follows, is thus fully described by the mixing equation

$$\mathbf{y}_p = \sum_{r=1}^{R} a_{r,p}\,\mathbf{m}_r + \sum_{i=1}^{R-1}\sum_{j=i+1}^{R} a_{i,p}\,a_{j,p}\,\mathbf{m}_i \odot \mathbf{m}_j + \mathbf{n}_p \qquad (6)$$

subject to the constraints in (2). One argument to explain the direct relation between the abundances
and the nonlinearity coefficients is the following: if the $i$th endmember is absent in the $p$th pixel, then
$a_{i,p} = 0$ and there are no interactions between $\mathbf{m}_i$ and the other materials $\mathbf{m}_j$
($j \neq i$). More generally, it is quite natural to assume that the quantity of nonlinear interactions in
a given pixel between two materials is directly related to the quantity of each material present in that
pixel. However, it is clear that this model does not generalize the LMM, which can be a restrictive property.

More recently, to alleviate this issue, the generalized bilinear model (GBM) has been proposed in
[12] by setting $\beta_{i,j,p} = \gamma_{i,j,p}\,a_{i,p}\,a_{j,p}$

$$\mathbf{y}_p = \sum_{r=1}^{R} a_{r,p}\,\mathbf{m}_r + \sum_{i=1}^{R-1}\sum_{j=i+1}^{R} \gamma_{i,j,p}\,a_{i,p}\,a_{j,p}\,\mathbf{m}_i \odot \mathbf{m}_j + \mathbf{n}_p \qquad (7)$$

where the interaction coefficient $\gamma_{i,j,p} \in (0, 1)$ quantifies the nonlinear interaction between the
spectral components $\mathbf{m}_i$ and $\mathbf{m}_j$. This model has the same interesting characteristic
as the FM: the amount of nonlinear interactions is governed by the presence of the endmembers that
linearly interact. In particular, again, if an endmember is absent in a pixel, there are no nonlinear
interactions involving this endmember. However, it also has the significant advantage of generalizing
both the LMM when $\gamma_{i,j,p} = 0$ and the FM when $\gamma_{i,j,p} = 1$. Having $\gamma_{i,j,p} > 0$
indicates that only constructive interactions are considered.
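
The two limiting cases of the GBM can be verified with a few lines of code. The sketch below evaluates the noise-free GBM spectrum for a hypothetical endmember matrix and checks that interaction coefficients equal to 0 give the LMM spectrum, while coefficients equal to 1 give the product-of-abundances (FM) spectrum. The function name `gbm` and the dictionary layout used for the interaction coefficients are assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def gbm(M, a, gamma):
    """Noise-free GBM spectrum; gamma maps each pair (i, j), i < j, to its coefficient."""
    y = M @ a
    for i, j in combinations(range(M.shape[1]), 2):
        y = y + gamma[(i, j)] * a[i] * a[j] * M[:, i] * M[:, j]
    return y

rng = np.random.default_rng(0)
L, R = 30, 3
M = rng.uniform(0.1, 0.9, size=(L, R))         # hypothetical endmember matrix
a = np.array([0.6, 0.3, 0.1])
pairs = list(combinations(range(R), 2))

y_gamma0 = gbm(M, a, {p: 0.0 for p in pairs})  # all interactions switched off
y_gamma1 = gbm(M, a, {p: 1.0 for p in pairs})  # all interactions at full strength

# Reference spectra computed directly from the LMM and FM expressions.
y_lmm = M @ a
y_fm = M @ a + sum(a[i] * a[j] * M[:, i] * M[:, j] for i, j in pairs)

# An absent endmember (a_3 = 0) contributes no interaction term at all.
a_absent = np.array([0.7, 0.3, 0.0])
y_absent = gbm(M, a_absent, {p: 1.0 for p in pairs})
y_two = M[:, :2] @ a_absent[:2] + a_absent[0] * a_absent[1] * M[:, 0] * M[:, 1]
```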

For illustration, synthetic mixtures of $R = 3$ spectral components have been randomly generated
according to the LMM, NM, FM and GBM. The resulting data sets are represented in Fig. 4 in the space
spanned by the three principal eigenvectors (associated with the three largest eigenvalues of the sample
covariance matrix of the data) identified by a principal component analysis. These plots
illustrate an interesting property: the spectral signatures of the pure components are still endmembers,
i.e., vertices of the clusters, in the cases of FM and GBM mixtures, contrary to the NM. In other words,
endmember extraction algorithms based on geometrical interpretations, i.e., those that are looking for
the simplex of largest or smallest volume (see [1] for details), can still be valid for the FM and the
GBM.

April 9, 2013 DRAFT

Fig. 4. Clusters of observations generated according to the LMM, the NM, the FM and the GBM (blue) and the
corresponding endmembers (red).



All these bilinear models only include between-component interactions $\mathbf{m}_i \odot \mathbf{m}_j$
with $i \neq j$ but no within-component interactions $\mathbf{m}_i \odot \mathbf{m}_i$. However, in [13],
the authors derive a nonlinear mixing model from a thorough physical analysis of a simple canyon-like
urban scene. Successive approximations and simplifying assumptions lead to the following
linear-quadratic mixing model (LQMM)

$$\mathbf{y}_p = \sum_{r=1}^{R} a_{r,p}\,\mathbf{m}_r + \sum_{i=1}^{R}\sum_{j=i}^{R} \beta_{i,j,p}\,\mathbf{m}_i \odot \mathbf{m}_j + \mathbf{n}_p \qquad (8)$$

with the positivity and additivity constraints in (2) and $\beta_{i,j,p} \in (0, 1)$. This model is similar to the
FM in (6), with the noticeable difference that the nonlinear contribution includes quadratic terms
$\mathbf{m}_i \odot \mathbf{m}_i$.

C. Other approximating physics-based models

To describe both macroscopic and microscopic mixtures, [14] introduces a dual model composed
of two terms

$$\mathbf{y}_p = \sum_{r=1}^{R} a_{r,p}\,\mathbf{m}_r + a_{R+1,p}\,\mathcal{R}\!\left(\sum_{r=1}^{R} f_{r,p}\,\mathbf{w}_r\right) + \mathbf{n}_p.$$

The first term is similar to the one encountered in the LMM and comes from the macroscopic mixing
process. The second one, considered as an additional endmember with abundance $a_{R+1,p}$, describes
the intimate mixture by the average single-scattering albedo expressed in the reflective domain by the
mapping $\mathcal{R}(\cdot)$.



Altmann et al. have proposed in [15] an approximating physics-based model able to describe a
wide class of nonlinearities. This model is derived by performing a second-order expansion using a
post-nonlinear mixing model (PNMM). More precisely, the $p$th observed pixel spectrum is defined
as a nonlinear transformation $\mathbf{g}_p(\cdot)$ of a linear mixture of the endmember spectra

$$\mathbf{y}_p = \mathbf{g}_p\!\left(\sum_{r=1}^{R} a_{r,p}\,\mathbf{m}_r\right) + \mathbf{n}_p \qquad (9)$$

where the nonlinear function $\mathbf{g}_p$ is defined as a second-order polynomial nonlinearity parameterized
by the unique nonlinearity parameter $b_p$

$$\mathbf{g}_p : (0,1)^L \to \mathbb{R}^L, \qquad \mathbf{x} \mapsto \left[x_1 + b_p x_1^2, \ldots, x_L + b_p x_L^2\right]^T \qquad (10)$$

This model can be rewritten

$$\mathbf{y}_p = \mathbf{M}\mathbf{a}_p + b_p\,(\mathbf{M}\mathbf{a}_p) \odot (\mathbf{M}\mathbf{a}_p) + \mathbf{n}_p$$

where $\mathbf{M} = [\mathbf{m}_1, \ldots, \mathbf{m}_R]$ and $\mathbf{a}_p = [a_{1,p}, \ldots, a_{R,p}]^T$. The parameter
$b_p$ tunes the amount of nonlinearity present in the $p$th pixel of the image and this model reduces to
the standard LMM when $b_p = 0$. Moreover, it can be easily shown that this polynomial post-nonlinear
model (PPNM) includes bilinear terms $\mathbf{m}_i \odot \mathbf{m}_j$ ($i \neq j$) similar to those defining
the FM, NM and GBM, as well as quadratic terms $\mathbf{m}_i \odot \mathbf{m}_i$ similar to the LQMM in (8).
This PPNM has been demonstrated to be sufficiently flexible to describe most of these bilinear models [15].
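
The claim that the PPNM contains both bilinear and quadratic terms follows from expanding the termwise square of the linear mixture: it equals the sum of $a_i^2\,\mathbf{m}_i \odot \mathbf{m}_i$ over all endmembers plus twice the sum of $a_i a_j\,\mathbf{m}_i \odot \mathbf{m}_j$ over pairs $i < j$. The sketch below checks this identity numerically; the function name `ppnm`, the endmember matrix and the value of the nonlinearity parameter are assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def ppnm(M, a, b):
    """Noise-free PPNM spectrum: linear mixture plus b times its termwise square."""
    lin = M @ a
    return lin + b * lin * lin

rng = np.random.default_rng(0)
L, R = 30, 3
M = rng.uniform(0.1, 0.9, size=(L, R))         # hypothetical endmember matrix
a = np.array([0.6, 0.3, 0.1])
b = 0.25                                       # hypothetical nonlinearity parameter

y = ppnm(M, a, b)

# Explicit expansion into quadratic (i = j) and bilinear (i < j) terms.
quadratic = sum(a[i] ** 2 * M[:, i] * M[:, i] for i in range(R))
bilinear = sum(a[i] * a[j] * M[:, i] * M[:, j]
               for i, j in combinations(range(R), 2))
y_expanded = M @ a + b * (quadratic + 2.0 * bilinear)
```

A single scalar `b` per pixel therefore controls all second-order terms at once, which is what keeps the model simple to estimate.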

D. Discussion 

Having reviewed the above physics-based models, an important remark must be made: these models
do not take into account interactions from materials present in the
neighborhood of the targeted pixel. This means that these bilinear models only consider scattering
effects in a given pixel induced by components that are present in this pixel. This is, naturally, a
strong simplifying assumption that allows, in the end, the model parameters (abundance and nonlinear
coefficients) to be estimated pixel-by-pixel in the inversion step. Note however that the problem of
taking into account adjacency effects, i.e., nonlinear interactions coming from spectral interference
caused by atmospheric scattering, has been addressed in an unmixing context in [16].



III. Nonlinear unmixing algorithms 

Significant and promising contributions have been proposed to nonlinearly unmix hyperspectral
data. A wide class of nonlinear unmixing algorithms relies explicitly on a nonlinear physics-based
parametric model, as detailed in Section II. Others do not require definition of the mixing model
and rely on very mild assumptions regarding the nonlinearities. These two classes of algorithms,
considered under the two unmixing scenarios (supervised vs. unsupervised), are described in this
section.

A. Model-based parametric nonlinear unmixing algorithms 

Given a nonlinear parametric model, SU can be formulated as a constrained nonlinear regression or
a nonlinear source separation problem, depending on whether the endmember spectral signatures are
known or not. When dealing with intimate mixtures, some authors propose converting the measured
reflectance into an average single-scattering albedo. Since the latter obeys a linear mixture, the
mass fractions associated with each endmember can be estimated using a standard linear unmixing
algorithm. This is the approach adopted in [6] and [7]. To avoid the painful conversion of the
reflectance measurements, which would require the use of look-up tables, a widely advocated
approach consists of resorting to neural networks (NN) to learn the nonlinear function. This is the
strategy followed by Guilfoyle et al. in [17], for which several improvements have been proposed
in [18] to reduce the computationally intensive learning step. Other NN-based algorithms include
[19]-[22].



For the bilinear models introduced in Section II, supervised nonlinear optimization methods have
been developed. When the observed pixel spectrum $\mathbf{y}_p$ is related to the parameters of interest
$\boldsymbol{\theta}_p$ (a vector containing the abundance coefficients as well as any other nonlinearity
parameters) through the function $\boldsymbol{\varphi}(\cdot)$, unmixing the pixel $\mathbf{y}_p$ consists
of solving the following minimization problem

$$\hat{\boldsymbol{\theta}}_p = \arg\min_{\boldsymbol{\theta}} \left\|\mathbf{y}_p - \boldsymbol{\varphi}(\mathbf{M}; \boldsymbol{\theta})\right\|_2^2 \qquad (11)$$

This problem raises two major issues: i) the nonlinearity of the criterion resulting from the underlying
nonlinear model $\boldsymbol{\varphi}(\cdot)$ and ii) the constraints to which the parameters in
$\boldsymbol{\theta}$ are subject. Since the NM can be
interpreted as a linear mixing model with additional virtual endmembers, estimation of the parameters
is conducted with a linear optimization method in [9]. In [11], [23], dedicated to the FM and GBM, the
authors propose to linearize the objective criterion via a first-order Taylor series expansion of
$\boldsymbol{\varphi}(\cdot)$.
Then, the fully constrained least squares (FCLS) algorithm in [24] can be used to estimate the vector of
parameters $\boldsymbol{\theta}$. An alternative algorithmic scheme proposed in [23] consists of resorting
to a gradient descent method where the step-size parameter is adjusted by a constrained line search
procedure to enforce the constraints inherent to the mixing model. Another alternative, initially
introduced in [12] for the GBM, is based on Monte Carlo approximations, developed in a fully Bayesian
statistical framework. The Bayesian setting has the great advantage of providing a convenient way to
include the parameter constraints within the estimation problem, by defining appropriate priors for
the parameters. This strategy has also been implemented to conduct PNMM-based unmixing in [15].
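
To make the constrained nonlinear least-squares formulation concrete, the sketch below unmixes one pixel under the Fan model with a projected-gradient iteration and a simple backtracking rule on the step size. It is a swapped-in simplification rather than the linearization or line-search schemes of the cited references: feasibility is enforced by projecting each iterate onto the simplex, and all names (`project_simplex`, `fm`, `fm_jacobian`, `unmix_fm`) and data are assumptions made for the example.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def fm(M, a):
    """Noise-free Fan model, using sum_{i<j} a_i a_j m_i*m_j = 0.5*[(M a)^2 - (M^2)(a^2)]."""
    lin = M @ a
    return lin + 0.5 * (lin ** 2 - (M ** 2) @ (a ** 2))

def fm_jacobian(M, a):
    """Column k holds d fm / d a_k = m_k + m_k * sum_{j != k} a_j m_j."""
    lin = M @ a
    return M + M * (lin[:, None] - M * a[None, :])

def unmix_fm(y, M, n_iter=500):
    """Projected gradient on ||y - fm(M, a)||^2 with step halving until the cost decreases."""
    a = np.full(M.shape[1], 1.0 / M.shape[1])
    cost = np.sum((y - fm(M, a)) ** 2)
    for _ in range(n_iter):
        grad = -2.0 * fm_jacobian(M, a).T @ (y - fm(M, a))
        step = 1.0
        while step > 1e-12:
            cand = project_simplex(a - step * grad)
            cand_cost = np.sum((y - fm(M, cand)) ** 2)
            if cand_cost < cost:               # accept the first step that lowers the cost
                a, cost = cand, cand_cost
                break
            step *= 0.5
        else:
            break                              # no decreasing step found: stop
    return a, cost

rng = np.random.default_rng(0)
L, R = 50, 3
M = rng.uniform(0.1, 0.9, size=(L, R))         # hypothetical endmember matrix
a_true = np.array([0.6, 0.3, 0.1])
y = fm(M, a_true)                              # noise-free synthetic pixel
a_hat, final_cost = unmix_fm(y, M)
```

Because the criterion is not convex in general, such a local scheme only guarantees a feasible point with a lower cost than the starting point; the result may depend on the initialization.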

When the spectral signatures involved in these bilinear models also need to be identified jointly with
the abundances and nonlinear coefficients, more ambitious unmixing algorithms need to be
designed. In [25], the authors differentiate the NM to implement updating rules that generalize
the SPICE algorithm introduced in [26] to solve the linear mixing model. Conversely, NMF-based
iterative algorithms have been advocated in [27] for the GBM, and in [13] for the LQMM described
in (8). Adopting a more geometrical point of view, Heylen and Scheunders propose in [28] an integral
formulation to compute geodesic distances on the manifold induced by the GBM. The underlying idea is to
derive an endmember extraction algorithm that identifies the simplex of maximum volume inscribed
on the manifold defined by the pixels.

B. Model-free nonlinear unmixing algorithms 

When the nonlinearity is unknown, the problem becomes even more challenging. In such cases,
a possible approach is to use a geometrical-based unmixing technique that relies on graph-based
approximate geodesic distances [29], or manifold learning techniques [30], [31]. Methods based on
support vector machines are also detailed in [32], [33]. Another promising approach is to resort to
nonparametric methods based on Gaussian processes [34], or on reproducing kernels [35]-[40], to
approximate the unknown nonlinearity. These two latter techniques are described in what follows.

Nonlinear algorithms operating in reproducing kernel Hilbert spaces (RKHS) have been a topic of
considerable interest in the machine learning community, and have proved their worth in solving
nonlinear problems. Kernel-based methods have been widely considered for detection and classification
in hyperspectral images. Surprisingly, nonlinear unmixing approaches operating in RKHS have been
investigated less thoroughly. The algorithms derived in [35], [36] were mainly obtained by
replacing each inner product between endmember spectra, in the cost functions to be optimized, by
a kernel function. This can be viewed as a nonlinear distortion map applied to the spectral signature
of each material, independently of their interactions. This principle may be extremely efficient in
solving detection and classification problems as a proper distortion can increase the detectability or
separability of some patterns. It is however of little physical interest in solving the unmixing problem
because the nonlinear nature of mixing is not only governed by individual spectral distortions, but
also by nonlinear interactions of the materials. In [37], a new kernel-based paradigm was proposed to
take the nonlinear interactions of the endmembers into account. It consists of considering the problem

$$\psi^* = \arg\min_{\psi \in \mathcal{H}} \sum_{\ell=1}^{L} \left[y_{\ell,p} - \psi(\mathbf{m}_{\lambda_\ell})\right]^2 + \mu\,\|\psi\|_{\mathcal{H}}^2 \qquad (12)$$

where $\mathbf{m}_{\lambda_\ell}$ stands for the vector of the endmember signatures at the $\ell$-th frequency
band, namely, $\mathbf{m}_{\lambda_\ell} = [m_{\ell,1}, \ldots, m_{\ell,R}]^T$, with $\mathcal{H}$ a given functional
space, and $\mu$ a positive parameter that controls the trade-off between regularity of the function
$\psi(\cdot)$ and fitting. It is interesting to note that (12) is the
functional counterpart to (11), where $\psi(\cdot)$ defines the nonlinear interactions between the endmembers.
Clearly, this strategy may fail if the functional space $\mathcal{H}$ is not chosen appropriately. A successful
strategy is to define $\mathcal{H}$ as an RKHS in order to exploit the so-called kernel trick. Let
$\kappa(\cdot, \cdot)$ be the reproducing kernel of $\mathcal{H}$. The RKHS $\mathcal{H}$ must be carefully
selected via its kernel in order to make it
flexible enough to capture wide classes of nonlinear relationships, and to reliably interpret a variety
of experimental measurements. In order to extract the mixing ratios of the endmembers, the authors
in [37] focus their attention on partially linear models. More precisely, the function $\psi(\cdot)$ in problem
(12) is defined by an LMM parameterized by the abundance vector $\mathbf{a}$, combined with a nonparametric
term, that is,

$$\psi(\mathbf{m}_{\lambda_\ell}) = \mathbf{a}^T \mathbf{m}_{\lambda_\ell} + \psi_{\mathrm{nlin}}(\mathbf{m}_{\lambda_\ell}) \qquad (13)$$

possibly subject to the constraints in (2), where $\psi_{\mathrm{nlin}}$ can be any real-valued function of an RKHS
denoted by $\mathcal{H}_{\mathrm{nlin}}$. This model generalizes the standard LMM, and mimics the PPNM when
$\mathcal{H}_{\mathrm{nlin}}$ is defined to be the space of polynomial functions of degree 2. Remember that the
latter is induced by the polynomial kernel
$\kappa(\mathbf{m}_{\lambda_\ell}, \mathbf{m}_{\lambda_{\ell'}}) = (\mathbf{m}_{\lambda_\ell}^T \mathbf{m}_{\lambda_{\ell'}})^q$
of degree $q = 2$. More complex interaction mechanisms can be considered by simply changing
$\kappa(\mathbf{m}_{\lambda_\ell}, \mathbf{m}_{\lambda_{\ell'}})$. By virtue of the reproducing kernel
machinery, the problem can still be solved in the framework of (12).
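
The link between the degree-2 polynomial kernel and second-order interactions between endmembers can be made explicit. The sketch below checks that the kernel value between the endmember vectors at two bands equals the inner product of explicit feature vectors made of all pairwise products of their entries, and builds the Gram matrix over all bands. The function names and the random endmember matrix are assumptions made for the example; no solver for the regularized problem is included.

```python
import numpy as np

def poly_kernel(u, v, q=2):
    """Homogeneous polynomial kernel (u^T v)^q."""
    return float(u @ v) ** q

def quad_features(u):
    """Explicit degree-2 feature map: all products u_i * u_j, so <phi(u), phi(v)> = (u^T v)^2."""
    return np.outer(u, u).ravel()

rng = np.random.default_rng(0)
L, R = 30, 3
M = rng.uniform(0.1, 0.9, size=(L, R))         # row l holds the R endmember values at band l

u, v = M[0], M[1]                              # endmember vectors at two different bands
k_uv = poly_kernel(u, v)
k_explicit = quad_features(u) @ quad_features(v)

# Gram matrix over all bands: K[l, l'] = (m_l^T m_l')^2.
K = (M @ M.T) ** 2
```

The feature map contains products of values of different endmembers at the same band as well as squared values, which correspond to the bilinear and quadratic terms of the physics-based models; the kernel evaluates their inner product without forming them.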



A number of interesting variants of the problem (12) can be constructed to incorporate prior
information. In [37] for instance, the authors propose to substitute the regularization term
$\mu\,\|\psi\|_{\mathcal{H}}^2$ by

$$\mu \left( \frac{1}{u}\,\|\mathbf{a}\|^2 + \frac{1}{1-u}\,\|\psi_{\mathrm{nlin}}\|_{\mathcal{H}_{\mathrm{nlin}}}^2 \right) \qquad (14)$$

with $0 < u < 1$. Problem (12) can be solved efficiently with respect to the partially linear model (13)
and $u$ because it is convex. This regularization allows the automatic adjustment of the balance between
the LMM and the nonlinear mixing submodel $\psi_{\mathrm{nlin}}(\cdot)$, which paves the way for the nonlinear
mixture detectors described hereafter. Regularization can also be used for integrating spatial information into
the hyperspectral unmixing problem. In [38], the authors have proposed using an $\ell_1$-type regularizer of
the form [41] to promote piecewise-constant transitions in the fractional abundances that parameterize
the partially linear model (13). The regularizer incorporated in problem (12) is expressed as

$$\eta \sum_{q \in \mathcal{N}(p)} \|\mathbf{a}_p - \mathbf{a}_q\|_1 \qquad (15)$$

where $\|\cdot\|_1$ denotes the vector $\ell_1$-norm, and $\mathcal{N}(p)$ the set of neighbors of the pixel $p$.
With the regularizer (15), the optimization problem (12) remains convex but is no longer smooth. A
split-Bregman iteration algorithm can be used to overcome this drawback.
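
The spatial regularizer can be evaluated directly on an abundance map. The sketch below computes, for one pixel, the weighted sum of the $\ell_1$ distances between its abundance vector and those of its neighbors. The 4-connected neighborhood, the function name `neighbor_l1`, the weight `eta` and the toy map are assumptions made for the example; the neighborhood actually used in the cited work may differ.

```python
import numpy as np

def neighbor_l1(A, i, j, eta=1.0):
    """eta * sum over the 4-connected neighbors q of pixel p = (i, j) of ||a_p - a_q||_1.

    A has shape (rows, cols, R): one abundance vector per pixel.
    """
    rows, cols, _ = A.shape
    total = 0.0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        qi, qj = i + di, j + dj
        if 0 <= qi < rows and 0 <= qj < cols:  # skip neighbors outside the image
            total += np.abs(A[i, j] - A[qi, qj]).sum()
    return eta * total

# Toy piecewise-constant map: left half is pure material 1, right half pure material 2.
A = np.zeros((4, 4, 2))
A[:, :2] = [1.0, 0.0]
A[:, 2:] = [0.0, 1.0]

inside = neighbor_l1(A, 0, 0)                  # pixel surrounded by identical abundances
border = neighbor_l1(A, 1, 1)                  # pixel touching the region boundary
```

The penalty vanishes inside homogeneous regions and is paid only along boundaries, which is how it favors piecewise-constant abundance maps.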

Another strategy by Altmann et al. considers a kernel-based approach for nonlinear SU based on
a nonlinear dimensionality reduction using a Gaussian process latent variable model (GPLVM) [34].
The main advantage of GPLVMs is their capacity to accurately model many different nonlinearities.
GPLVMs construct a smooth mapping from the space of fractional abundances to the space of observed
mixed pixels that preserves dissimilarities. This strategy is also considered in [40] by Nguyen et
al., who solve the so-called pre-image problem [42] studied by the machine learning community.
In the spectral unmixing context, preserving dissimilarities means that pixels that are spectrally
different have different latent variables and thus different abundance vectors. However, preserving
local distances is also interesting: spectrally close pixels are expected to have similar abundance
vectors and thus similar latent variables. Several approaches have been proposed to preserve
similarities, including back-constraints and locally linear embedding. In [34], the authors have used
a particular form of kernel which extends the generalized bilinear model (GBM) of Section II-B. The
proposed algorithm is unsupervised in the
sense that the endmembers contained in the image and the mixing model are not known. Only the
number of endmembers is assumed to be known. As a consequence, the parameters to be estimated
are the kernel parameters, the endmember spectra and the abundances for all image pixels.

IV. Detecting nonlinear mixtures 

Consideration of nonlinear effects in hyperspectral images can provide more accurate results in 
terms of endmember and abundance identification. However, working with nonlinear models generally 
requires a higher computational complexity than approaches based on the LMM. Thus, unmixing 
linearly mixed pixels using nonlinear models should be avoided. Consequently, it is of interest 
to devise techniques to detect nonlinearities in the mixing process before applying any unmixing 
method. Linearly mixed pixels can then be unmixed using linear unmixing techniques, leaving the 
application of more involved nonlinear unmixing methods to situations where they are really necessary. 
This section describes approaches that have been recently proposed to detect nonlinear mixing in 
hyperspectral images. 

A. Investigation using surrogate data methods 

A possible approach to test for nonlinear mixing in hyperspectral images is to consider the 
components of the observation vector as data series and employ existing time series analysis techniques 
for nonlinearity detection. Recently, surrogate data methods were proposed as a complement to the
investigation of nonlinearity in hyperspectral imagery [43]. The general surrogate data procedure
was proposed in the early 1990s [44]. One first assumes that the observed data to be tested come
from a specific hypothetical process that characterizes a null hypothesis $\mathcal{H}_0$. An ensemble of surrogate
data sets that satisfy $\mathcal{H}_0$ and share given properties of the observed data is then generated. A
discriminating statistic is then computed for each surrogate data set and for the observed data. The
hypothesis $\mathcal{H}_0$ is rejected if the value of the test statistic for the observed data is not consistent with
those provided by the surrogate data at a specified significance level. The keys to success in applying
the surrogate data method are the proper generation of the surrogate data and the specification of the
discriminating statistic.



The surrogates in |43 1 were generated using a technique called improved amplitude adjusted Fourier 
transform (AAFT) |45|, which tries to match the distributions and the power spectra iteratively and 
reportedly leads to less false rejections than the standard AAFT. Two different discriminating statistics 
were tested on the available data obtained from a 4-m AVIRIS image: the third order moment of the 
difference between adjacent samples and the spectral angle between these samples. The conclusions 



of this study indicate that the results of tests using surrogate data methods as applied in 1 43 1 may be 
incorporated with other supporting evidence to decide about the linearity of the mixing of endmember 
contributions in hyperspectral images. 
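
The surrogate data procedure can be sketched in a few lines of Python. The code below is a minimal illustration, not the implementation used in [43]: it assumes the usual iterative scheme that alternately imposes the power spectrum and the amplitude distribution of the observed series, it uses the third-order moment of the differences between adjacent samples as discriminating statistic, and the function names, the number of surrogates and the rank-based two-sided p-value are choices made for this sketch only.

```python
import numpy as np


def iaaft_surrogate(x, n_iter=50, rng=None):
    """Generate one surrogate sharing the amplitude distribution of x and,
    approximately, its power spectrum (iterative amplitude adjustment)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    target_amplitude = np.abs(np.fft.rfft(x))
    s = rng.permutation(x)  # random shuffle as starting point
    for _ in range(n_iter):
        # Step 1: impose the Fourier amplitudes of x, keep the current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amplitude * np.exp(1j * phases), n=x.size)
        # Step 2: impose the amplitude distribution of x by rank ordering.
        s = sorted_x[np.argsort(np.argsort(s))]
    return s


def third_moment_of_increments(x):
    """Third-order moment of the difference between adjacent samples."""
    return float(np.mean(np.diff(x) ** 3))


def surrogate_test(x, statistic=third_moment_of_increments,
                   n_surrogates=99, alpha=0.05, seed=0):
    """Return (reject_H0, p_value) from a rank-based two-sided comparison."""
    rng = np.random.default_rng(seed)
    t_obs = statistic(x)
    t_sur = np.array([statistic(iaaft_surrogate(x, rng=rng))
                      for _ in range(n_surrogates)])
    center = np.median(t_sur)
    n_extreme = np.sum(np.abs(t_sur - center) >= abs(t_obs - center))
    p_value = (1 + n_extreme) / (n_surrogates + 1)
    return p_value <= alpha, p_value
```

In the hyperspectral setting, `x` would be the vector of spectral components of one observed pixel; the spectral angle mentioned above could be passed as an alternative `statistic`.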

B. Detection using a polynomial post-nonlinear mixing model (PPNMM) 

One interesting approach for nonlinearity detection is to assume a parametric nonlinear mixing 
model that can model different nonlinearities that could affect the mixing of the endmember contri- 
butions to the pixel hyperspectral observations. A model that has been successfully applied to this 
end is the post-nonlinear mixing model (PPNMM) studied in |15|, po*) . 



The PPNMM assumes the post-nonlinear mixing described in (9) with the polynomial nonlinearity
$g_p$ defined in (10). Hence, the nonlinearity is characterized by the parameter $b_p$ for each pixel in
the scene. This parameter can be estimated in conjunction with the abundance vector $\mathbf{a}_p$ and the noise
variance $\sigma^2$. Defining $s^2(\mathbf{a}_p, b_p, \sigma^2)$ as the variance of the estimator $\hat{b}_p$ of $b_p$, and using the properties
of the maximum likelihood estimator, it makes sense to approximate the distribution of $\hat{b}_p$ by the
following Gaussian distribution

$$\hat{b}_p \sim \mathcal{N}\left(b_p, s^2(\mathbf{a}_p, b_p, \sigma^2)\right).$$

The nonlinearity detection problem has the two hypotheses 

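$$\begin{aligned} \mathcal{H}_0 &: \mathbf{y}_p \text{ is distributed according to the LMM (1)} \\ \mathcal{H}_1 &: \mathbf{y}_p \text{ is distributed according to the PPNMM (9)} \end{aligned} \tag{16}$$

Hypothesis $\mathcal{H}_0$ is characterized by $b_p = 0$ whereas nonlinear models ($\mathcal{H}_1$) correspond to $b_p \neq 0$.
Then, (16) can be rewritten as

$$\begin{aligned} \mathcal{H}_0 &: \hat{b}_p \sim \mathcal{N}(0, s_0^2) \\ \mathcal{H}_1 &: \hat{b}_p \sim \mathcal{N}(b_p, s_1^2) \end{aligned} \tag{17}$$

where $s_0^2 = s^2(\mathbf{a}_p, 0, \sigma^2)$ and $s_1^2 = s^2(\mathbf{a}_p, b_p, \sigma^2)$ with $b_p \neq 0$.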

Detection can be performed using the generalized likelihood ratio test. The test strategy is given
by

$$T = \frac{\hat{b}_p^2}{s_0^2} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta. \tag{18}$$

As shown in [46], the distribution of the statistic $T$ is known under the two hypotheses because
$\hat{b}_p$ is normally distributed. Consequently, the threshold $\eta$ can be explicitly related to the
probability of false alarm (PFA) and the probability of detection (PD), i.e., the power of the test.
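
The relation between the threshold, the PFA and the PD can be illustrated numerically. The Python sketch below assumes that the test statistic is the squared estimate of $b_p$ divided by $s_0^2$ and that the estimator is Gaussian under both hypotheses, with mean 0 and variance $s_0^2$ under the linear hypothesis and mean $b_p$ and variance $s_1^2$ under the nonlinear one. The function names are hypothetical and the expressions are derived from these assumptions rather than copied from [46].

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero-mean, unit-variance Gaussian


def threshold_from_pfa(pfa):
    """Threshold eta such that P(T > eta | H0) = pfa.

    Under H0, b_hat / s0 is standard normal, so
    P(b_hat**2 / s0**2 > eta) = 2 * (1 - Phi(sqrt(eta))).
    """
    return _STD_NORMAL.inv_cdf(1.0 - pfa / 2.0) ** 2


def probability_of_detection(eta, b, s0, s1):
    """P(T > eta | H1) when b_hat ~ N(b, s1**2)."""
    r = math.sqrt(eta) * s0  # detection region is |b_hat| > r
    return (1.0 - _STD_NORMAL.cdf((r - b) / s1)
            + _STD_NORMAL.cdf((-r - b) / s1))


def glrt_detect(b_hat, s0, eta):
    """Decide H1 (nonlinear mixture) when the statistic exceeds eta."""
    return b_hat ** 2 / s0 ** 2 > eta


eta = threshold_from_pfa(0.05)
print(glrt_detect(b_hat=0.5, s0=0.1, eta=eta))
```

With these assumptions, setting $b_p = 0$ and $s_1 = s_0$ in `probability_of_detection` returns the PFA itself, which is a convenient sanity check.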

However, this detection strategy assumes prior knowledge of the variances $s_0^2$ and $s_1^2$. If this is
not the case, Altmann et al. show in [46] that the test strategy can be modified to

$$T^* = \frac{\hat{b}_p^2}{\hat{s}_0^2} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta \tag{19}$$

where the estimate $\hat{s}_0^2$ can be calculated as

$$\hat{s}_0^2 = \mathrm{CCRLB}(0; \hat{\mathbf{a}}_p, \hat{\sigma}^2) \tag{20}$$

where CCRLB is the constrained Cramér-Rao lower bound [47] on estimates of the parameter vector
$\boldsymbol{\theta} = [\mathbf{a}_p^T, b_p, \sigma^2]^T$ under $\mathcal{H}_0$, and $\hat{\mathbf{a}}_p$ (resp. $\hat{\sigma}^2$) is the MLE of $\mathbf{a}_p$ (resp. $\sigma^2$). The performance of the
resulting test is illustrated in Fig. 5, which shows the pixels detected as nonlinear when generated
according to various mixing models (LMM, FM, GBM and PPNMM).

C. Robust model-free detection 

The detector discussed in the previous section assumes a specific nonlinear mixing model under 
the alternative hypothesis. There are, however, situations where the actual mixing does not obey any 
available model. It is also possible that there is insufficient information to opt for any existing nonlinearity
model. In these cases, it is interesting to address the problem of determining whether an observed 
pixel is a linear function of endmembers or results from a generic nonlinear mixing. 

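One may consider the LMM (1) and the hyperplane $\mathcal{P}$ defined by

$$\mathcal{P}: \left\{ \mathbf{z}_p \,\middle|\, \mathbf{z}_p = \mathbf{M}\mathbf{a}_p,\ \sum_{r=1}^{R} a_{r,p} = 1 \right\}. \tag{21}$$
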
Fig. 5. Pixels detected as linear (red crosses) and nonlinear (blue dots) for the four subimages generated according to the
LMM, FM, GBM, and PPNMM. Black lines depict the simplex corresponding to the noise-free LMM.



In the noise-free case, the hyperplane $\mathcal{P}$ lies in an $(R-1)$-dimensional subspace embedding all
observations distributed according to the LMM. On the other hand, consider the general nonlinear
mixing model

$$\mathbf{y}_p = \mathbf{M}\mathbf{a}_p + \boldsymbol{\mu}_p + \mathbf{n}_p \tag{22}$$

where $\boldsymbol{\mu}_p$ is an $L \times 1$ deterministic vector that does not belong to $\mathcal{P}$, i.e., $\boldsymbol{\mu}_p \notin \mathcal{P}$, and $\mathbf{a}_p$ satisfies
the constraints (2). Note that $\boldsymbol{\mu}_p$ can be a nonlinear function of the endmember matrix $\mathbf{M}$ and/or the
abundance vector $\mathbf{a}_p$ and should be denoted as $\boldsymbol{\mu}_p(\mathbf{M}, \mathbf{a}_p)$ [49]. However, the arguments $\mathbf{M}$ and $\mathbf{a}_p$
are omitted here for brevity. Given an observation vector $\mathbf{y}_p$, the detection of nonlinear mixtures can
be formulated as the following binary hypothesis testing problem

$$\begin{aligned} \mathcal{H}_0 &: \mathbf{y}_p \text{ is distributed according to the LMM (1)} \\ \mathcal{H}_1 &: \mathbf{y}_p \text{ is distributed according to the model (22).} \end{aligned}$$

Using the statistical properties of the noise $\mathbf{n}_p$ yields $E[\mathbf{y}_p | \mathcal{H}_0] = \mathbf{M}\mathbf{a}_p \in \mathcal{P}$ whereas $E[\mathbf{y}_p | \mathcal{H}_1] = \mathbf{M}\mathbf{a}_p + \boldsymbol{\mu}_p \notin \mathcal{P}$.
As a consequence, it makes sense to consider the squared Euclidean distance



$$\delta^2(\mathbf{y}_p) = \min_{\mathbf{z} \in \mathcal{P}} \|\mathbf{y}_p - \mathbf{z}\|^2 \tag{23}$$

between the observed pixel $\mathbf{y}_p$ and the hyperplane $\mathcal{P}$ to decide which hypothesis ($\mathcal{H}_0$ or $\mathcal{H}_1$) is true.

Note that a nonlinear mixing model similar to (22), coupled with a group-sparse constraint on $\boldsymbol{\mu}_p$, has been explicitly adopted
in [48] to make the unmixing of hyperspectral pixels more robust when only a few of them are assumed to come from
nonlinear mixtures.



As shown in [49], the quantities $\delta^2(\mathbf{y}_p)$ are distributed according to $\chi^2$ distributions under the two
hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$. The parameters of these distributions depend on the known matrix $\mathbf{M}$, the
noise variance $\sigma^2$ and the nonlinearity vector $\boldsymbol{\mu}_p$. If $\sigma^2$ is known, the distribution of $\delta^2(\mathbf{y}_p)$ is perfectly
known under $\mathcal{H}_0$ and partially known under $\mathcal{H}_1$. In this case, one may employ a statistical test that
does not depend on $\boldsymbol{\mu}_p$, such as

$$T = \frac{\delta^2(\mathbf{y}_p)}{\sigma^2} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta. \tag{24}$$

As in the detection procedure detailed in Section IV-B, the threshold $\eta$ can be related to the PFA and PD
in closed-form expressions.
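
A minimal Python sketch of this model-free test is given below. It rests on assumptions that are not spelled out in the text above: the noise is taken to be i.i.d. zero-mean Gaussian with known variance, the endmembers are assumed affinely independent, and the distance to the hyperplane divided by the noise variance is then taken to follow a central $\chi^2$ distribution with $L - R + 1$ degrees of freedom under the linear hypothesis. The function names are hypothetical, and `scipy.stats.chi2` is used for the quantile.

```python
import numpy as np
from scipy.stats import chi2


def squared_distance_to_hyperplane(y, M):
    """Squared Euclidean distance between y and {M a : sum(a) = 1}.

    The affine set is parameterized as m_R + D c, where the columns of D
    are the differences m_r - m_R for r = 1, ..., R-1.
    """
    ref = M[:, -1]
    D = M[:, :-1] - ref[:, None]
    c, *_ = np.linalg.lstsq(D, y - ref, rcond=None)
    residual = y - ref - D @ c
    return float(residual @ residual)


def detect_nonlinear_mixture(y, M, sigma2, pfa=0.05):
    """Return (decision, T, eta); decision is True when H1 is accepted."""
    L, R = M.shape
    dof = L - R + 1  # assumed degrees of freedom under H0
    eta = chi2.ppf(1.0 - pfa, dof)
    T = squared_distance_to_hyperplane(y, M) / sigma2
    return bool(T > eta), T, eta


# Usage on synthetic data (all values are illustrative).
rng = np.random.default_rng(0)
M = rng.random((10, 3))              # L = 10 bands, R = 3 endmembers
a = np.array([0.2, 0.3, 0.5])        # abundances summing to one
y = M @ a + 0.1 * rng.standard_normal(10)
print(detect_nonlinear_mixture(y, M, sigma2=0.01))
```

Only the sum-to-one constraint is used to define the hyperplane, so the positivity of the abundances plays no role in the distance computation.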



If the noise variance $\sigma^2$ is unknown, which is the case in most practical applications, one may
replace $\sigma^2$ in (24) with an estimate $\hat{\sigma}^2$. This yields the test

$$T^* = \frac{\delta^2(\mathbf{y}_p)}{\hat{\sigma}^2} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta \tag{25}$$

where $\eta$ is the threshold computed as previously. The PFA and PD of the test (25) are then explicitly
obtained using cumulative distribution functions of the $\chi^2$ distribution. It was shown in [49] that the better
the estimation of $\sigma^2$, the closer the distributions of $T$ and $T^*$ and thus the closer the performances
of the tests (24) and (25). Several techniques can be used to estimate $\sigma^2$. For instance, $\sigma^2$ has
been estimated in [49] through an eigen-analysis of the sample covariance matrix of a set of pixels
assumed to share the same variance. The value of $\hat{\sigma}^2$ was determined as the average of the smallest
eigenvalues of the sample covariance matrix. The accuracy of the estimate is a function of the number
of eigenvalues considered. It was shown in [49] that a PFA smaller (resp. larger) than the nominal value $P_{\mathrm{FA}}$ is obtained
if $\hat{\sigma}^2 > \sigma^2$ (resp. $\hat{\sigma}^2 < \sigma^2$).
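
The eigen-analysis used to estimate the noise variance can be sketched as follows. This Python example is an assumption-laden illustration rather than the procedure of [49]: the pixels are stored as the columns of an $L \times N$ matrix, the function name and the parameter `n_smallest` are hypothetical, and averaging the $L - R + 1$ smallest eigenvalues is only one possible choice.

```python
import numpy as np


def estimate_noise_variance(Y, n_smallest):
    """Average of the n_smallest eigenvalues of the sample covariance of Y.

    Y: array of shape (L, N), one pixel spectrum per column; the pixels are
    assumed to share the same noise variance.
    """
    covariance = np.cov(Y)                       # rows are treated as variables
    eigenvalues = np.linalg.eigvalsh(covariance)  # returned in ascending order
    return float(eigenvalues[:n_smallest].mean())


# Synthetic check: linear mixtures of R = 3 endmembers plus white noise.
rng = np.random.default_rng(0)
L, R, N = 20, 3, 5000
M = rng.random((L, R))
A = rng.dirichlet(np.ones(R), size=N).T          # abundances, shape (R, N)
Y = M @ A + 0.1 * rng.standard_normal((L, N))    # true noise variance: 0.01
print(estimate_noise_variance(Y, n_smallest=L - R + 1))
```

Including too many eigenvalues lets signal energy leak into the average and inflates the estimate, which by the result quoted above lowers the actual PFA below its nominal value.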

V. Conclusion 

To overcome the intrinsic limitations of the linear mixing model, several recent contributions have
been made to the modeling of the physical processes that underlie the hyperspectral observations.
Some models are physically based and attempt to account for the interactions between materials that
the photons may be subjected to before reaching the spectro-imager. Based on these models, several
parametric algorithms have been proposed to solve the resulting nonlinear unmixing problem. Another 
class of unmixing algorithms attempts to avoid the use of any rigid nonlinear model by resorting 
to nonparametric machine learning-inspired techniques. The price to pay for handling nonlinear 
interactions induced by multiple scattering effects or intimate mixtures is the computational complexity 
and a possible degradation of unmixing performance when processing large hyperspectral images. To 
overcome these difficulties, one possible strategy consists of detecting pixels subjected to nonlinear 
mixtures in a pre-processing step. Detected linearly mixed pixels can then benefit from the huge 
and reliable literature dedicated to the linear unmixing problem. The remaining pixels, detected as
nonlinear, can then be the subject of particular attention. Finally, this paper demonstrates that the
nonlinear unmixing problem represents an exciting issue that requires expertise from signal and
image processing researchers with various methodological backgrounds.